APPROXIMATE EVALUATION TECHNIQUES FOR THE MAX HIERARCHICAL 

CLUSTERING PROCEDURE 



Lawrence Hubert 



Abstract 



This paper presents a simple technique for testing the hypothesis 
that a hierarchical sequence of partitions constructed by the max 
method could have been obtained solely on the basis of "noise." The 
test procedure involves comparing a rank-order goodness-of-fit 
measure (the Goodman-Kruskal γ statistic) to the tabled percentiles 
obtained from an approximate cumulative permutation distribution of 
the measure. One of the rank orderings of the object pairs used in 
defining γ is derived immediately from the given similarity values 
between the objects to be partitioned; the second rank ordering of 
the object pairs is obtained from the partition hierarchy itself. 
The tested hypothesis is simply that the given set of similarity 
values has been assigned randomly to the object pairs. 

1. INTRODUCTION 

In recent years a substantial number of applied researchers 
have attempted to use the max hierarchical clustering scheme as a 
general data analysis technique. Representative applications may 
be found in Miller [11], Anglin [1], Sokal and Sneath [13], 
Johnson [8], and Hubert [5,6]. Variously called the max [8], 
complete-link [13], furthest-neighbor [9], or hierarchical linkage 
technique [10], this particular clustering procedure has been used 
primarily as a descriptive device since there is no standard way 
of statistically evaluating the adequacy of the obtained sequence 
of partitions. Although the lack of an elegant methodology is 
understandable given the combinatorial problems posed by the method, 
approximate statistical procedures can be developed now in terms of 
randomization and sampling theory until the more exact assessment 
methods become available in the future. 

As a way of presenting a brief summary of what the max clustering 
method does, suppose S is a set of n objects labeled $o_1, \ldots, o_n$ and 
$\{s_{ij}\}$ is an n by n matrix containing similarity measures between 
all objects $o_i$ and $o_j$.¹ For a rather weak initial requirement, it 
is assumed that the elements of $\{s_{ij}\}$ satisfy three constraints: 

(i) Symmetry: $s_{ij} = s_{ji}$ for all $o_i, o_j \in S$; 

(ii) Positivity: $s_{ij} \geq 0$ for all $o_i, o_j \in S$; 

(iii) Nullity: $s_{ij} = 0$ for all $o_i = o_j$. 

In the discussion below, similarity will be treated as a primitive 
term; the interested reader can consult Sokal and Sneath [13] or 
Jardine and Sibson [7] for a more thorough presentation. 

The general aim of most hierarchical clustering techniques is to 
produce an "optimal" sequence of partitions of the basic object set 
S. More precisely, the commonly used clustering methods produce a 
sequence of partitions $(\ell_0, \ell_1, \ldots, \ell_{n-1})$ with the following properties: 

(i) $\ell_0$ is the trivial partition containing an object 
class for each element in S; 

(ii) $\ell_{n-1}$ is the trivial partition containing a single 
all-inclusive object class; 

(iii) $\ell_k$ is an immediate refinement of $\ell_{k+1}$ for $0 \leq k \leq n-2$. 

It is possible to characterize inductively one general paradigm 
for the construction of a partition sequence that will include most 
of the familiar clustering methods. Suppose the level k partition, 
$\ell_k$, has been obtained and some real-valued function f defined on the 
Cartesian product $P(S) \times P(S)$ of the power set P(S) with itself is 
evaluated for all pairs of subsets defining $\ell_k$. That pair of subsets 
at level k minimizing (or in some other way optimizing) the function f 
is then united to form a new object class in the partition $\ell_{k+1}$. 
All remaining subsets in $\ell_k$ are merely transferred to $\ell_{k+1}$. 

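As an illustration, the max method and an alternative min 
method [8] are obtained by the following two interpretations of f: 
if $L_r, L_s \in P(S)$, then 

$$f_{\max}(L_r, L_s) = \max\{s_{ij} \mid o_i \in L_r,\ o_j \in L_s\},$$ 

$$f_{\min}(L_r, L_s) = \min\{s_{ij} \mid o_i \in L_r,\ o_j \in L_s\}.$$ 

The max method uses $f_{\max}$ and attempts to minimize subset diameters; 
the min method uses $f_{\min}$ and minimizes a standard topological measure 
of similarity between subsets. 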
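
The inductive paradigm can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's program: the function name `hierarchy`, the `link` parameter, and the four-object matrix are assumptions introduced here, and ties between candidate merges are broken by taking the first pair encountered, which the paper does not specify. Values are treated as in the text, with smaller values meaning "more alike."

```python
from itertools import combinations


def hierarchy(s, link=max):
    """Return the partition sequence l_0, ..., l_{n-1} for a symmetric matrix s.

    link=max sketches the max (complete-link) method and link=min the min
    (single-link) method. Smaller s[i][j] means the objects are more alike.
    """
    partition = [frozenset([i]) for i in range(len(s))]
    levels = [list(partition)]
    while len(partition) > 1:
        # f(L_r, L_s): max or min of s_ij over o_i in L_r and o_j in L_s
        def f(pair):
            a, b = pair
            return link(s[i][j] for i in a for j in b)

        # unite the pair of subsets minimizing f; ties go to the first pair found
        a, b = min(combinations(partition, 2), key=f)
        partition = [c for c in partition if c not in (a, b)] + [a | b]
        levels.append(list(partition))
    return levels


# Hypothetical data: four points on a line at 0, 1, 2.1 and 3.3
s = [
    [0.0, 1.0, 2.1, 3.3],
    [1.0, 0.0, 1.1, 2.3],
    [2.1, 1.1, 0.0, 1.2],
    [3.3, 2.3, 1.2, 0.0],
]
max_levels = hierarchy(s, link=max)
min_levels = hierarchy(s, link=min)
```

On this small example the two interpretations of f already disagree at level 2: the max method unites objects 2 and 3, whereas the min method chains object 2 onto the class containing objects 0 and 1.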

Since both the max and the min techniques depend only upon the 
rank order of the similarity values, either of these two procedures 
can be interpreted as a way of reranking the object pairs. Specifically, 
the partition rank for each object pair $\{o_i, o_j\}$ is defined as the 
level at which that pair first belongs to the same subset in a 
partition. Symbolically, the partition rank for the pair $\{o_i, o_j\}$ can 
be expressed as 

$$\min\{k \mid \{o_i, o_j\} \text{ belongs to the same subset in } \ell_k\}.$$ 

By comparing the set of all partition ranks to the original similarity 
ranks, the adequacy of the partition hierarchy in capturing the 
structure underlying the matrix $\{s_{ij}\}$ can be assessed. The measure 
used in the following sections for quantifying this agreement is the 
γ statistic developed by Goodman and Kruskal [4]; although this 
choice is somewhat arbitrary, the γ statistic has a number of 
desirable properties with regard to probabilistic interpretation in 
the case of tied ranks that the more standard measures of rank 
correlation do not possess. 
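
For readers who want to compute the index directly, the following is a minimal sketch of the Goodman-Kruskal statistic in its usual form, (C - D)/(C + D), where C and D count the concordant and discordant pairs and any pair tied on either ordering is ignored. The function name and the six illustrative rank values are assumptions made for this example and are not taken from the paper's data.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma between two orderings of the same items.

    Pairs of items tied on x or on y contribute to neither count.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ranks for six object pairs
similarity_ranks = [1, 2, 3, 4, 5, 6]
partition_ranks = [1, 2, 2, 3, 3, 2]
g = goodman_kruskal_gamma(similarity_ranks, partition_ranks)
```

Here 9 pairs are concordant, 2 are discordant, and the 4 pairs tied on partition rank are dropped, so the index is 7/11. Because partition ranks are heavily tied by construction, the way the index discards ties matters a great deal in this application.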



2. COMPARISON OF THE MIN AND THE MAX METHODS 



Jardine and Sibson [7] provide a very strong axiomatic argument 
for the use of the single-link (min) as opposed to the complete-link 
(max) clustering procedure. Although their presentation is 
mathematically elegant, a number of other researchers in the field, 
notably the "Australian school" (see, for example, [14]), have 
criticized the min method on pragmatic grounds. As a way of introducing 
a more extensive discussion of the max method per se, this section 
will present one simple illustration to point out the differences 
between the min and the max technique in terms of the γ statistic. 

In the example given in a later section, an object set with 
cardinality 9 was defined with 30 distinct similarity values assigned 
to 36 object pairs. Since both the min and the max procedures 
depend solely upon the rank order of the similarity values, this is 
equivalent to assigning 30 distinct ranks to the 36 object pairs. 
Using this fixed set of ranks, 1000 permutations were randomly 
selected with replacement from the set of all possible permutations 
of the object pairs. For each permutation, the min and the max 
hierarchies were obtained along with the two corresponding γ values. 
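
The sampling scheme just described can be sketched as follows for the max method alone. This is a hedged illustration rather than a reconstruction of the original program: the function names, the seed, the use of untied values 1 through 15 on six objects, and the first-found tie rule in the clustering step are all assumptions. Independent shuffles of the fixed value set stand in for drawing permutations with replacement.

```python
import random
from itertools import combinations


def max_partition_ranks(n, s):
    """Partition rank of every object pair under the max method.

    s maps frozenset({i, j}) to a similarity value (smaller = more alike).
    """
    clusters = [frozenset([i]) for i in range(n)]
    rank = {}
    for level in range(1, n):
        def f(pair):
            a, b = pair
            return max(s[frozenset((i, j))] for i in a for j in b)

        a, b = min(combinations(clusters, 2), key=f)
        for i in a:
            for j in b:
                rank[frozenset((i, j))] = level  # first level at which i and j meet
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return rank


def gamma(x, y):
    """Goodman-Kruskal gamma; pairs tied on either ordering are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        product = (x1 - x2) * (y1 - y2)
        concordant += product > 0
        discordant += product < 0
    return (concordant - discordant) / (concordant + discordant)


def sample_gammas(values, n, draws=1000, seed=0):
    """Sample the permutation distribution of gamma for the max hierarchy."""
    rng = random.Random(seed)
    pairs = [frozenset(p) for p in combinations(range(n), 2)]
    values = list(values)
    gammas = []
    for _ in range(draws):
        rng.shuffle(values)  # random assignment of the fixed values to pairs
        s = dict(zip(pairs, values))
        ranks = max_partition_ranks(n, s)
        gammas.append(gamma([s[p] for p in pairs], [ranks[p] for p in pairs]))
    return gammas


# Hypothetical run: six objects, 15 untied values, 200 draws
gammas = sample_gammas(range(1, 16), n=6, draws=200)
mean_gamma = sum(gammas) / len(gammas)
```

Sorting `gammas` gives the sample cumulative distribution from which percentage points can be read. If the sketch matches the procedure used for the paper, the mean for six objects can be compared against the corresponding entry in Table 3, although agreement is not guaranteed here.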

It is obvious from the cumulative distribution of γ given in 
Table 1 that, on the average, the max procedure provides the more 
adequate representation of the original similarity values. This 
result holds true in general and is not an artifact of the cardinality 
of the object set used in this example. 



Table 1 here 
The accuracy of the distribution obtained with a sample of 1000 
can be evaluated in a number of ways. First, by using tolerance 
intervals we can say that with probability greater than .999, 99 
percent of the complete permutation distribution is less than the 
maximum observation (see Table 5 in [2]). Thus, if a γ value greater 
than .82 were obtained for a max hierarchy based upon similarity 
values of the form used in constructing Table 1, there would be little 
doubt as to the significance of the result. In particular, if the 
null hypothesis is one of randomness in the assignment of similarity 
values to the object pairs, then each permutation of the object pairs 
should be equally likely to occur a priori. Consequently, if a γ 
value larger than .82 were calculated for the actual data using the 
max hierarchy, this particular null hypothesis could be rejected at 
a significance level close to .01. 
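
The tolerance statement can be checked with the standard distribution-free result for the largest of N independent observations. The derivation below is a sketch that assumes a continuous distribution function F; since the permutation distribution of γ is discrete, the bound should be read as conservative. The notation $X_{(N)}$ for the sample maximum is introduced here and does not appear in the original.

```latex
P\bigl(F(X_{(N)}) \ge p\bigr) = 1 - p^{N},
\qquad
1 - (.99)^{1000} = 1 - e^{1000 \ln(.99)} \approx 1 - e^{-10.05}
\approx 1 - 4.3 \times 10^{-5} > .999 .
```

With p = .99 and N = 1000, the probability that at least 99 percent of the distribution lies below the maximum observation therefore exceeds .999, in agreement with the statement above.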

A second way of assessing the accuracy of the sampling procedure 
is in terms of Kolmogorov-Smirnov theory. Using a sample size of 
1000 the following statement can be made conservatively since the 
underlying distribution of γ is discrete: with probability at least 
.99, the maximum absolute deviation between the sample and the 
population cumulative distribution function is less than .05 (see [3], 
p. 81). Thus, if the γ value obtained for the real data lies at the 
1 - α percentage point of the permutation distribution, the null 
hypothesis of randomness can be rejected conservatively at a 
significance level of about α + .05. 

Obviously, these measures of accuracy for the sampling procedure 
could be improved even further; for practical purposes, however, 
a sample size of 1000 was used in deriving the tables given in the 
next section for n = 4 through 16. For larger values of n, a normal 
approximation based upon an estimated mean and variance of the 
permutation distribution appears to be adequate. 
3. TABLES FOR ASSESSING THE RESULTS FROM THE MAX 
CLUSTERING PROCEDURE 

Assuming that all similarity values are untied, Table 2 presents 
selected percentage points of the sample cumulative distribution 
functions for γ for n = 4 through 16. As mentioned in the previous 
section, these distributions are based upon a sample size of 1000 
and should provide fairly reasonable approximations to the population 
distribution functions. 

Table 2 here 

For small values of n the sample permutation distributions are 
extremely peaked although they are almost perfectly symmetric. The 
mean values and the variances decrease fairly regularly as n increases; 
in fact, by merely extrapolating from the means and variances 
presented in Table 3, reasonable approximations to object sets larger 
than 16 could be obtained. Instead of extrapolating, however, the 
estimated means and variances for n = 17 through 25 were obtained 
with samples of 200 permutations and may be used as parameters of an 
approximating normal distribution. 

Table 3 here 

For moderate n, the normal distribution provides a fairly 
adequate approximation to the underlying sample permutation 
distribution. For example, Table 4 illustrates the close correspondence 
with the normal when n = 14. If n is small, however, the sample 
permutation distribution is considerably more peaked than the 
corresponding normal distribution. 
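
As a sketch of how the normal approximation might be applied, the mean and standard deviation listed in Table 3 can be used directly as the parameters of a normal distribution. The helper names below are assumptions made for illustration; only `statistics.NormalDist` from the Python standard library is used, and the inputs .343 and .0634 are the Table 3 entries for n = 14.

```python
from statistics import NormalDist


def normal_percentage_point(mean, sd, proportion):
    """gamma value below which `proportion` of the approximating normal lies."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(proportion)


def approximate_significance(mean, sd, observed_gamma):
    """Upper-tail area beyond an observed gamma under the normal approximation."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(observed_gamma)


# Table 3 entries for n = 14: mean .343, standard deviation .0634
point_99 = normal_percentage_point(0.343, 0.0634, 0.99)
tail = approximate_significance(0.343, 0.0634, 0.490)
```

The value of `point_99` comes out close to .490, and the same call with the mean and standard deviation for n = 17 through 25 gives approximate percentage points for the larger object sets. As the text notes, this approximation should not be trusted for small n.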



Table 4 here 
4. EXAMPLE 

There is a basic problem with the use of Table 2 when tied 
similarity values are present, since in a strict sense the tabled 
percentage points are then no longer appropriate. Although a 
complete discussion of the effect of ties would be valuable, 
analytically the task seems impossible. We can, however, present 
an example of what happens to the permutation distribution when ties 
do occur. 

As a way of discussing the problem of ties and presenting an 
illustration of the use of Table 2, the data collected by Shepard [12] 
on the confusability of nine colors is ideal. The basic nine by 
nine similarity matrix in Shepard's paper consisted of the conditional 
probabilities of confusing one colored circle with eight other 
possibilities, each of which had the same constant red hue but 
different values for brightness and saturation. To make the 
similarity measures symmetric, the values given in Table 5 were 
obtained by adding the symmetric elements from Shepard's table and 
subtracting the result from 1.00. 



Tables 5 and 6 here 

In addition to the similarity values between colors, Table 5 
also lists the partition ranks for the object pairs obtained from 
the max partition hierarchy given in Table 6. The γ value between 
the partition ranks and the ordered similarity values turned out 
to be .687 and apparently represents a substantial value. To test 
whether this γ index is large enough to reject the null hypothesis of 
a random ordering, 1000 permutations of the object pairs were obtained 
using the similarity values illustrated in Table 5. The percentage 
points for this cumulative distribution were given previously in 
Table 1 and imply that the null hypothesis can be rejected at a 
significance level of about .01, subject to the variability induced 
by the sampling process itself. 

Although obtaining a separate permutation distribution for each 
similarity matrix is the ideal alternative, this recommendation 
defeats the overall usefulness of the percentage points given in 
Table 2. Most of the time, however, the tabled values will be 
sufficient to convince the researcher that he is not obtaining a 
hierarchy based upon noise alone. This can be done merely by breaking 
the ties in the original similarity values once to obtain the largest γ 
value and then a second time to obtain the minimum value. These 
bounds are on the γ values that can be obtained from the partition 
hierarchy assuming the similarity values are untied; but in addition, 
because of the way in which γ is defined, it is also true that the 
original γ calculated for tied similarity values will lie between 
these two bounds. 

The upper and lower bounds on γ are rather easy to obtain 
without a complete evaluation of all the possible ways in which ties 
may be resolved. Each individual set of tied similarity values can 
be reordered to give a minimum γ value with respect to its own 
subset of partition ranks and then a second time to give a maximum 
γ value. When used together, these local operations construct 
overall orderings of the similarity values that lead to the global 
upper and lower bounds for the γ index. 
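
The local tie-breaking operations can be sketched as two sorts. Within each block of tied similarity values, ordering the pairs in agreement with their partition ranks turns every breakable tie into a concordance, and ordering them against their partition ranks turns every breakable tie into a discordance. The function names and the six-pair example are assumptions for illustration only; the paper's Table 5 data are not reproduced here.

```python
from itertools import combinations


def gamma(x, y):
    """Goodman-Kruskal gamma; pairs tied on either ordering are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        product = (x1 - x2) * (y1 - y2)
        concordant += product > 0
        discordant += product < 0
    return (concordant - discordant) / (concordant + discordant)


def gamma_bounds(similarity, partition_rank):
    """Return (minimum gamma, gamma with ties kept, maximum gamma)."""
    n = len(similarity)
    items = range(n)
    # Second sort key reorders each tied block with or against the partition ranks
    agree = sorted(items, key=lambda k: (similarity[k], partition_rank[k]))
    oppose = sorted(items, key=lambda k: (similarity[k], -partition_rank[k]))

    def untied_gamma(order):
        new_rank = [0] * n
        for position, k in enumerate(order):
            new_rank[k] = position  # distinct ranks, so no similarity ties remain
        return gamma(new_rank, partition_rank)

    return (
        untied_gamma(oppose),
        gamma(similarity, partition_rank),
        untied_gamma(agree),
    )


# Hypothetical tied similarity ranks and partition ranks for six object pairs
similarity = [1, 2, 2, 3, 3, 4]
partition_rank = [1, 2, 1, 3, 2, 2]
low, tied, high = gamma_bounds(similarity, partition_rank)
```

In this example the index with ties kept is 7/9, and resolving the two breakable ties gives bounds of 5/11 and 9/11. Pairs that are tied on partition rank as well stay out of both counts however the similarity tie is resolved, which is why the local sorts suffice.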

Table 5 presents the orderings of the similarity values based 
upon these two local operations. The minimum γ is .663 and the 
maximum γ is .688. Since the minimum bound is at a high percentage 
point in Table 2, the null hypothesis still appears untenable. 

In general, if the maximum bound is not sufficient to reject 
randomness at a reasonable significance level, then a permutation 
distribution based upon the unique form of the similarity values 
will lead to the same conclusion (subject, of course, to the sampling 
variability in the permutation distributions). Similarly, if the 
minimum bound is sufficient to reject randomness, then the more 
exact permutation distribution will imply rejection also. However, 
if either of these two conditions does not occur, using the more 
exact permutation distribution based upon the exact similarity values 
is probably the only reasonable procedure to follow. 



Table 1. SAMPLE CUMULATIVE PERMUTATION DISTRIBUTIONS OF γ FOR 
NINE OBJECTS [N = 1000, 30 DISTINCT SIMILARITY VALUES] 

            Cumulative proportions 
   γ         Min(a)      Max(b) 

  .02        .002        .000 
  .06        .006        .000 
  .10        .016        .000 
  .14        .032        .000 
  .18        .065        .003 
  .22        .122        .006 
  .26        .202        .021 
  .30        .314        .058 
  .34        .429        .120 
  .38        .570        .223 
  .42        .706        .350 
  .46        .801        .485 
  .50        .862        .654 
  .54        .919        .784 
  .58        .956        .882 
  .62        .985        .942 
  .66        .996        .980 
  .70        .998        .990 
  .74       1.000        .997 
  .78       1.000        .999 
  .80       1.000        .999 
  .82       1.000       1.000 

(a) Mean of .37; standard deviation of .12. 

(b) Mean of .47; standard deviation of .10. 



Table 3. RELATIONSHIPS BETWEEN THE NUMBER OF OBJECTS IN S AND THE 
SAMPLE MEAN AND STANDARD DEVIATION OF γ(a) 

   n       Mean γ      Standard Deviation γ 

   4        .818            .1665 
   5        .706            .1606 
   6        .613            .1407 
   7        .553            .1222 
   8        .502            .1119 
   9        .457            .0990 
  10        .432            .0908 
  11        .400            .0828 
  12        .376            .0749 
  13        .353            .0724 
  14        .343            .0634 
  15        .324            .0635 
  16        .307            .0601 
  17        .299            .0554 
  18        .288            .0563 
  19        .277            .0530 
  20        .265            .0481 
  21        .254            .0457 
  22        .245            .0402 
  23        .235            .0461 
  24        .230            .0426 
  25        .229            .0387 

(a) Mean and standard deviation based upon samples of 1000 
through n = 16; for larger n, sample sizes are 200. 
Table 4. NORMAL APPROXIMATION TO THE (SAMPLE) PERMUTATION DISTRIBUTION 
OF γ [n = 14] 

  Sample γ     Sample percentage     Standardized γ(a) 

   .195             .010                 .195 
   .209             .020                 .213 
   .218             .030                 .224 
   .227             .040                 .232 
   .236             .050                 .239 
   .245             .060                 .244 
   .249             .070                 .249 
   .255             .080                 .254 
   .259             .090                 .259 
   .262             .100                 .260 
   .289             .200                 .289 
   .311             .300                 .310 
   .326             .400                 .327 
   .345             .500                 .343 
   .361             .600                 .359 
   .377             .700                 .376 
   .395             .800                 .396 
   .422             .900                 .426 
   .425             .910                 .428 
   .430             .920                 .432 
   .433             .930                 .436 
   .440             .940                 .441 
   .444             .950                 .447 
   .451             .960                 .454 
   .460             .970                 .462 
   .471             .980                 .473 
   .488             .990                 .490 

(a) Based upon the sample mean and standard deviation given in 
Table 3 for n = 14. 
Table 6. PARTITION HIERARCHY OBTAINED FROM SHEPARD'S DATA USING 
THE MAX METHOD 

Level     Partition 

  1       {{1,2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}} 

  2       {{1,2}, {4,7}, {3}, {5}, {6}, {8}, {9}} 

  3       {{1,2}, {3,5}, {4,7}, {6}, {8}, {9}} 

  4       {{1,2}, {3,5}, {4,7}, {6,8}, {9}} 

  5       {{1,2}, {3,5}, {6,8}, {4,7,9}} 

  6       {{1,2,3,5}, {6,8}, {4,7,9}} 

  7       {{1,2,3,5,6,8}, {4,7,9}} 
FOOTNOTES 

Lawrence Hubert is Assistant Professor, Department of Educational 
Psychology, University of Wisconsin, Madison, Wisconsin, 53706. 

¹ To be consistent with Johnson [8], the term "similarity" is 
used rather than "dissimilarity." 